Dynamic Multi-Sensor and Multi-Robot Interface System

ABSTRACT

An adaptive learning interface system that enables end-users to control one or more machines or robots to perform a given task, combining identification of gaze patterns, EEG channel signal patterns, voice commands and/or touch commands. The output streams of these sensors are analysed by a processing unit in order to detect one or more patterns that are translated into one or more commands to the robot, to the processing unit or to other devices. A pattern learning mechanism is implemented by keeping an immediate history of outputs collected from those sensors, analysing their individual behaviour and analysing the time correlation between patterns recognized from each of the sensors. Prediction of patterns, or of combinations of patterns, is enabled by analysing a partial history of the sensors' outputs. A method for defining a common coordinate system between robots and sensors in a given environment, and therefore dynamically calibrating these sensors and devices, is used to share the characteristics and positions of each object detected in the scene.

CROSS REFERENCE

This application is a divisional application from U.S. patent application Ser. No. 14/941,879, filed on Nov. 16, 2015 and entitled Dynamic Multi-Sensor and Multi-Robot Learning Interface System, which claims the benefit of U.S. Provisional Patent Application No. 62/080,353, filed on Nov. 16, 2014 and entitled Dynamic Multi-Sensor and Multi-Robot Learning Interface System.

TECHNICAL FIELD

The present invention pertains to an adaptive learning interface system for controlling and operating robotic devices. More particularly, the present invention pertains to such controlling and operating of robotic devices with human-produced signals.

FIELD AND BACKGROUND OF THE INVENTION

The present invention, in some embodiments thereof, relates to a dynamic multi-sensor and multi-machine calibration method.

Computer and robotic applications that include some level of interaction with the physical environment require estimating and sharing the 2D or 3D coordinates of objects detected in the given environment. When more than one sensor or actuator is involved, it is critical to have these coordinates translated to a relative position with respect to the receiving device or sensor. For example, if an object is detected after analysing an output image from a depth sensor, in order to have a robot grasp this object, those coordinates must be translated to a relative position with respect to the robot itself as opposed to the camera that detected it. When multiple sensors and robots are combined, obtaining these equivalent coordinates becomes very complex.

Many industrial applications of robotic machines include robotic machines working together as a robotic team. These teams include machines of multiple designs, manufacturers and generations, and are often assisted by external sensors to determine the location of objects in the environment. In order for each machine to understand where a given object is located, the object's coordinates must be converted to a relative position with respect to each of the participating machines. In a static environment such as the industrial one described, this requires some setup effort to determine the relative position of each sensor and robot in the system with respect to each other. After this initial effort, and considering that the bases of the robots and the sensors are in fixed positions, the initial data is used to make the required calculations for sharing coordinate data between robots and sensors.

In a less static environment, an initial setup effort to estimate these relative positions is not enough, as both the robots' bases and the sensors might change positions in an untraceable way, and therefore the relative positions between them will not always be the same.

While requiring setup effort, there are many methods to translate coordinates between devices in the static scenario described above. For a dynamically changing scenario, where sensors and robots change their relative positions with respect to each other, the problem is more complicated, and there are no practical, dynamic methods available that solve it in a simple and timely manner.

In other embodiments of the current invention, the methods described above are used as a system to assist physically impaired patients, who can demand actions from a robot by combining one or more gesture mechanisms: eye gaze, voice, gestures, EEG signals, touch, and others. A robot is set to assist a paralyzed or weak patient in manipulating or picking objects around a scene. A method is defined to enable such patients to control a robotic arm through gaze. A camera pointed towards an end-user's eye is able to track the direction the pupil is directed to, while one or more external sensors are able to detect the object in the scene that the user is observing. A special human-machine interface system is able to translate gaze movements into requests by the end-user and translate these into actions that the robot can perform on scene objects.

SUMMARY OF THE INVENTION

According to some embodiments of the present invention, there is provided a method for defining a common coordinate system between robots and sensors in a given environment. The method comprises collecting a sequence of a plurality of images showing at least one sensor and/or one mechanical device with an internal coordinate system, such as a robotic arm, performing an analysis of the sequence of a plurality of images to identify the sensor or machine in the scene and to determine its relative position with respect to the sensor from which the sequence of images was collected, and creating the dynamic functions and parameter matrices required to transform any given coordinates from/to the generating sensor to/from the detected sensor or machine.

Optionally, the sequence of a plurality of images is collected by at least one depth map sensor.

Optionally, the sequence of a plurality of images is collected by at least one digital camera.

Optionally, the sensor detected in the image is an eye tracker placed in front of an end-user's eye.

Optionally, the generating sensor is an eye tracker with one or more users' eyes in its field of view.

Optionally, the detected machine is a robotic arm.

Optionally, the detected machine is an unmanned vehicle.

Optionally, the detection algorithm is adjusted to recognize robotic machines of different types, makes, and models.

Optionally, the generating sensor or the detected one is mounted on an end-user in order to determine the end-user's relative position with respect to one or more robots and/or sensors in the environment.

Optionally, the collection and analysis of sequences of images is performed dynamically on a video stream input.

Optionally, the collection and analysis is done on a sequence of depth maps and is performed dynamically on a depth map input stream.

Optionally, other objects are detected in the sequence of images or depth maps, and their 2D or 3D coordinates are calculated relative to one or more sensors' or machines' coordinate systems and shared between them.

Optionally, a new virtual coordinate system is created and shared between the image-generating device and the rest of the sensors and/or machines detected in the images, or between the controllers of these devices.

Optionally, all of the above is performed in parallel by analysing a sequence of a plurality of images collected from two or more sensors in the environment.

According to some embodiments of the present invention, there is provided an end-user interface for controlling one or more machines or robots to perform a given task required by an end-user. The end-user can control the robots' actions by moving his gaze in the direction of an object in the environment, then selecting through the interface a given object and the action to be performed on it or with it. The method comprises gathering gaze position data from a plurality of images collected from a sensor that has one or more end-users' eyes in its field of view, then enabling a selection capability for the given object by detecting either an eye blink of predetermined time length or a predetermined gaze gesture, and highlighting the object on the screen for feedback purposes. Then, an option selection is enabled by showing a list of available actions on screen and allowing the end-user to scroll through them by directing his/her gaze in a given direction. Options are highlighted on screen as gaze movements are detected in the given direction. Finally, a selection capability is enabled by detecting the end-user's blink for a predetermined length of time, or by detecting a predetermined gaze gesture, while an option is highlighted on screen. A complementary external sensor or camera can optionally be used to determine the coordinates of the selected object with respect to the robotic arm and/or with respect to the end-user's eyes. At this point, a processing unit transfers the coordinates of the selected objects to the robot, translated into coordinates that are relative to the robot itself and based on the first method described above. It also transfers the type of action selected by the end-user to be performed by the robot on the object or with the object.

Optionally, instead of selecting an action to be performed on the object, the end-user can select in the same way the option of controlling the robot's or robots' movements through gaze. From this point on, and until deselecting this option, the robot will move proportionally in the same direction that the gaze detection algorithm identifies that the pupil or eye centre moved.

Optionally, instead of selecting an action to be performed on the object, the end-user can select in the same way an option of controlling the robots' movements through gaze in a “joystick-mode”. From this point on, and until deselecting this option, the robot will move proportionally in the same direction that the gaze detection algorithm detects while the user's gaze is displaced beyond a predetermined pixel radius from a pre-calibrated home position.

Optionally, a scalar factor is used to calculate the proportional displacement in space to be performed by the robot with respect to the proportional displacement of the end-user's pupil.

Optionally, this scalar factor can be edited by the end-user, enabling him to change the distance that the robot will be displaced in space for each pixel of displacement on a gaze detection track. This acts as a sort of virtual gear for the robot being controlled, where step movements can be increased or decreased by a scale factor.

Optionally, the scalar factor mentioned above is used to define a fixed step that the robot will be displaced in the direction of the gaze displacement.

Optionally, a gaze-gesture recognition algorithm enables the end-user to indicate selections and actions. These gestures include, but are not limited to, vertical, horizontal and diagonal movements at certain speeds and with a displacement gap that is bigger than a predetermined threshold. Gestures also include detection of movements of gaze in circle-like trajectories in clockwise and counter-clockwise directions.

According to some embodiments of the present invention, there is provided an apparatus associated with a multi-modal controller. The apparatus comprises at least one processing unit and one or more sensors of images and/or depth maps and/or sounds and/or voice and/or EEG and/or touch. The outputs of these sensors are analysed by the processing unit in order to detect one or more patterns on inputs from one or more sensors, which are translated into one or more commands to the robot, to the processing unit or to other devices. A pattern learning mechanism is implemented by keeping a history of outputs collected from those sensors, analysing any apparent pattern in these outputs and analysing the time correlation between patterns recognized from each of the sensors. The end-user can then visualize those patterns and their interrelation, and define a command or set of commands to be executed each time similar pattern combinations are detected in the future.

Optionally, patterns are detected by fitting geometrical shapes to trajectories created by tracking the relative displacement of the end-users' eye centers. For example, detecting a circular type of movement, or a linear type of movement and its direction.

Optionally, patterns are detected by fitting geometrical shapes to trajectories of other body parts such as finger tips, hands, head orientation and others.

Optionally, patterns are pre-recorded and used to identify end-users' requests.

Optionally, a mechanical device such as a robot is connected to the controller. Commands detected through the patterns system described above are translated into actions that this device will execute.

Optionally, other electrical devices such as lights, appliances or other electrical-powered artefacts are connected to the controller. Commands detected through the patterns system described above are translated into actions that these devices will execute.

Optionally, a predictive method is implemented that anticipates the pattern or combination of patterns to be generated by analysing a partial history of the sensors' output. For example, if patterns were detected and defined based on a set of 50 consecutive images from an input video camera, or from a collection of images acquired during 5 seconds of video history, a prediction method is implemented to detect a potential future pattern based on only the last 20 consecutive images or on the last 2 seconds of video history. If a circle-like movement is tracked from the end-user's eye center position, detecting half a circle in the partial history track activates a prediction that translates into the predicted command corresponding to the circle-like type of shape in the history tracking.

According to some embodiments of the present invention, there is provided an apparatus associated with at least one sensor. The apparatus comprises at least one spatial sensor for collecting a sequence of a plurality of images; at least one processing unit which analyses a sequence of a plurality of images to identify other sensors, robots, end-user parts and/or objects; at least one processing unit which analyses a sequence of a plurality of images to identify the position of at least one additional sensor, end-user or robot; at least one storage unit to save the sequence of a plurality of images and the analysis results, the at least one processing algorithm, the at least one coordinate conversion algorithm and parameters, and the at least one technical usage characteristic; at least one digital communication method to communicate the coordinates and conversion algorithms and parameters to the at least one associated robotic machine or other devices; at least one digital communication method to communicate the at least one technical usage characteristic to at least one statistical analysis system; and housing for containing the at least one spatial sensor, the at least one processing unit, the storage unit, and the communication method, the housing being configured for being suitable for the environment of the robotic machine in the cases where a robotic machine is included.

Optionally, the at least one spatial sensor consists of at least one depth map sensor.

Optionally, the at least one spatial sensor consists of at least one digital camera.

Optionally, the at least one digital communication method is any digital communication method typically or atypically used in the working environment of the associated at least one robotic machine, including, but not limited to, wired Ethernet, wireless Ethernet, serial communication standards, parallel communication standards, infrared communications, wired universal serial bus, wireless universal serial bus, Bluetooth communication, or any type of typical or atypical industrial automation protocol.

Unless otherwise defined, all technical and/or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the invention pertains. Although methods and materials similar or equivalent to those described herein may be used in the practice or testing of embodiments of the invention, exemplary methods and/or materials are described below. In case of conflict, the patent specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and are not intended to be necessarily limiting.

Implementation of the method and/or system of embodiments of the invention may involve detection of objects, sensors and/or machines manually, automatically, or a combination thereof. Moreover, according to actual instrumentation and equipment of embodiments of the method and/or system of the invention, several selected options could be implemented by hardware, software or firmware, or by a combination thereof using an operating system.

For example, hardware for performing analysis tasks according to embodiments of the invention could be implemented as a chip or a circuit. As software, selected tasks according to embodiments of the invention could be implemented as a plurality of software instructions being executed by a computer using any suitable operating system. In an exemplary embodiment of the invention, one or more tasks according to exemplary embodiments of the method and/or system as described herein are performed by a data processor, such as a computing platform for executing a plurality of instructions. Optionally, the data processor includes a volatile memory for storing instructions and/or data and/or a non-volatile storage, for example, a magnetic hard-disk and/or removable media, for storing instructions and/or data. Optionally, a network connection is provided as well. A display and/or a user input device such as a keyboard or mouse are optionally provided as well.

BRIEF DESCRIPTION OF THE DRAWINGS

Some embodiments of the invention are herein described, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of embodiments of the invention. In this regard, the description taken with the drawings makes apparent to those skilled in the art how embodiments of the invention may be practiced.

FIG. 1 is a schematic representation of the dynamic coordinate conversion system between devices. It illustrates an end-user equipped with an eye-tracking camera, EEG and microphone, a robot and external cameras.

FIG. 2 is a schematic representation of a camera, a robot and an object to be manipulated. It illustrates how each object/device has its own pose and coordinates, which are shared and then processed to obtain a common spatial coordinate system.

FIG. 3 is a schematic representation of pattern recognition of a gaze gesture.

FIG. 4 is a schematic representation of an end-user requesting commands through a combination of voice, gaze and EEG patterns.

FIG. 5 is a schematic representation of the on-screen feedback given to the end-user on gaze control and gesture recognition.

FIG. 6 is a schematic representation of the Dynamic Multiple-Devices Calibration workflow.

FIG. 7 is a schematic representation of selection between sets of two axes in order to control a robot or other mechanical devices with gaze.

FIG. 8 is a schematic representation of 3 axes being controlled through gaze, one of the axes being defined by the variation of pupil diameter in time.

DESCRIPTION OF EMBODIMENTS OF THE INVENTION

According to some embodiments of the present invention, there is provided a method for defining a common coordinate system between robots and sensors in a given environment. The method comprises collecting a sequence of a plurality of images showing at least one sensor and/or one mechanical device with an internal coordinate system, such as a robotic arm, performing an analysis of the sequence of a plurality of images to identify the sensor or machine in the scene and to determine its relative position with respect to the sensor from which the sequence of images was collected, and creating a dynamic function and the parameters required to transform any given coordinates from/to the generating sensor to/from the detected sensor or machine.

Optionally, the plurality of images is complemented by corresponding depth maps or other spatial correlation matrices. The analysis includes object recognition algorithms to detect the rest of the sensors and/or robots in the environment.

Optionally, special stickers or markers such as chessboards, barcodes or other recognizable IDs are placed over each of the sensors in the environment and/or over the robots' end points in order to facilitate these algorithms. Orthogonal unit vectors representing the axes of the object's coordinate system can be derived from these special stickers. They assist in describing the rotation of its coordinates with respect to the generating sensor.

Optionally, the stickers or markers above are placed on a skin section or sections of an end-user, or on clothes or devices worn by the end-user, in order to determine the end-user's relative position with respect to one or more robots and/or sensors in the environment.

Optionally, color segmentation can be used in each image in order to locate other sensors or mechanical devices. This will be useful in environments where the colors of the sensors and/or robots are not present in other objects.

Optionally, shape or contrast segmentation can be used where the shapes of the robots and/or sensors are unique in the environment.

Optionally, a combination of the above can be used for segmentation purposes in determining the presence and location of other sensors and/or robots in the environment.

FIG. 2 shows a schematic illustration where the location and pose of the end point of the robot is detected by analysing the external camera's image or depth map frames. The detection mechanism might be facilitated by adding a special graphic identifier to the robot's end-point (i.e. a chessboard sticker or printout, a QR code or other graphic signs) and/or the end-point might be detected directly by algorithms that identify and locate the robot's gripper or end-point. Additionally, coordination algorithms might be implemented to facilitate the process, where the sensor locates the end-point at a given robot location, then the robot moves and again a detection and calculation is performed from an image at the new position. The figure includes:

-   201—External camera with robot arm and scene objects in its field of view
-   202—Robotic arm
-   203—Robotic arm's gripper
-   204—Special identifier of robotic end-point pose for location and identification by other sensors
-   205—Objects on scene to be manipulated by robot

Reference is now made to FIG. 6, a flowchart of one embodiment described herein for clarification, according to some embodiments of the present invention. FIG. 6 shows an illustration of the workflow to perform for the multiple devices' and sensors' calibration. In an initial step (1), a collection of depth maps, images and/or other spatial data is collected. In step (2), recognition algorithms are run over the frames collected from the sensors, which are read as BGR matrices and/or depth matrices and/or other types of sensing data matrices, to find other sensors in the field of view and/or mechanical devices such as robots. Search algorithms might be assisted by adding identifiable stickers on each sensor with an encoded ID. These stickers might be chessboard printouts, barcodes, actual letters and numbers that are recognized through OCR algorithms, or others. Once the corners of the chessboards are detected in step (4), vectors are constructed for the x, y and z coordinates relative to the x,y,z coordinates 0,0,0 of the generating sensor. This matrix might include a parallel shift (i.e. x+xdisplacement, y+ydisplacement, z+zdisplacement) and/or a rotational angle displacement (i.e. degrees of rotation for the x coordinate with respect to the x coordinate of the generating sensor, degrees of rotation for y and degrees of rotation for z). This information is stored and used anytime (step 8) an object is detected by any of the sensors and needs to be translated to the coordinates of another sensor in the environment or of a robot or mechanical device.
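
The chessboard-based detection and pose estimation referred to in steps (2) to (4) can be sketched as follows. This is a minimal illustration assuming an OpenCV pipeline, a 9x6 inner-corner chessboard marker of known square size, and previously obtained camera intrinsics; PATTERN, SQUARE_SIZE, camera_matrix and dist_coeffs are illustrative assumptions rather than values from the disclosure.

```python
import cv2
import numpy as np

# Illustrative assumptions: a 9x6 inner-corner chessboard sticker of known
# square size, and intrinsics (camera_matrix, dist_coeffs) obtained from a
# prior calibration of the generating sensor.
PATTERN = (9, 6)
SQUARE_SIZE = 0.025  # metres per chessboard square (assumed value)

def marker_pose(frame_bgr, camera_matrix, dist_coeffs):
    """Return (rvec, tvec): rotation and parallel shift of the chessboard
    marker relative to the generating sensor, or None if not found."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, PATTERN)
    if not found:
        return None
    corners = cv2.cornerSubPix(
        gray, corners, (11, 11), (-1, -1),
        (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001))
    # 3D corner positions in the marker's own coordinate system.
    obj_pts = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
    obj_pts[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2)
    obj_pts *= SQUARE_SIZE
    # rvec/tvec express the rotational displacement and parallel shift of the
    # marker (i.e. of the sensor or robot end-point carrying it) in the
    # generating sensor's coordinate frame.
    ok, rvec, tvec = cv2.solvePnP(obj_pts, corners, camera_matrix, dist_coeffs)
    return (rvec, tvec) if ok else None
```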

Once a sensor is detected and its position determined with respect to the generating sensor (the one from which the images were collected), a conversion matrix is built to allow the transformation of coordinates from one device to and from the other. This is done by calculating a parallel shift of each axis (x,y,z in 3D or x,y in 2D), and calculating rotation angles and directions for each of these axes. The parallel shift and rotation angle parameters are saved and used to transform coordinates between sensors and/or machines in the environment. For example, if a sensor detects an object, it will determine the x,y,z coordinates of the object within the sensor's coordinate system (where usually the sensor's position is 0,0,0 in its own coordinate system). Then, when a robot is required to perform an action on this object, a transformation of the coordinates of the object is performed towards the coordinate system of the robot itself. This transformation typically utilizes knowledge of the relationship between the sensor and the robot.
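
A minimal sketch of applying the stored parallel shift and rotation when an object detected by a sensor must be expressed in a robot's coordinate system. It assumes the rotation is kept as a 3x3 matrix and the shift as a 3-vector; the function names and the example values are illustrative.

```python
import numpy as np

def sensor_to_robot(point_sensor, R_sensor_to_robot, t_sensor_to_robot):
    """Transform a 3D point from the sensor's coordinate system to the
    robot's coordinate system using the stored rotation and parallel shift."""
    p = np.asarray(point_sensor, dtype=float)
    return R_sensor_to_robot @ p + t_sensor_to_robot

def robot_to_sensor(point_robot, R_sensor_to_robot, t_sensor_to_robot):
    """Inverse transform: robot coordinates back to sensor coordinates."""
    p = np.asarray(point_robot, dtype=float)
    return R_sensor_to_robot.T @ (p - t_sensor_to_robot)

# Example: an object detected by the sensor at (0.10, -0.05, 0.80) metres is
# expressed in robot coordinates before a grasp command is issued.
# R and t are placeholder values standing in for the calibration result.
R = np.eye(3)
t = np.array([0.40, 0.00, -0.20])
print(sensor_to_robot([0.10, -0.05, 0.80], R, t))
```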

Optionally, the matrices and/or parameters mentioned above describe a single rotation about some axis according to Euler's rotation theorem, using three or more real numbers.

Optionally, quaternions are used to represent the translation and rotation of the detected sensor and/or mechanical device with respect to the generating sensor.

For example, analysis of frame 0 of a video stream plus depth map of a given sensor can identify the location of a second sensor, a gripper, a robotic arm end-point or other devices in the scene. A rotation unit quaternion is calculated based on, for example, three given points that are co-planar to the detected device. Then, the detected device shares its 3D location based on its own 3D coordinate system. For example, a robotic arm can share where its end-point is located according to its own XYZ coordinate system and can also share the rotation of the end point represented as a rotation angle around each of its base axes. A later frame will again detect the position of these three points in the generating sensor's coordinate system. If the position of one or more of these points changed according to the generating sensor's coordinate system, the processing unit can estimate the rotation and translation of the robotic end point with respect to the previous location in robot coordinates by calculating Q×P×Conjugate of Q, Q being the inverse quaternion of the rotation quaternion defined by the three robot end-point points detected in the previous frame with respect to the planes of the sensor, normalized as a unit quaternion, and P being the quaternion that represents the delta displacement from the previous frame to the current frame. The result is used to increase/decrease each robot axis coordinate value, shared in the previous frame, in order to know the robot coordinates equivalent to the camera/sensor ones. The rotation of the endpoint in robot coordinates is calculated by robQ×CamQ, where robQ is the unit quaternion representing the robot endpoint rotation in the original frame, expressed in the robot coordinate system as rotations around each of its base axes, and CamQ is the unit quaternion representing the delta rotation of the three detected points with respect to the previous frame in camera coordinates. A pre-equivalence between axes might be set up by the end-user by defining, for example, that the X axis coordinate in the sensor's coordinate system will be equivalent to the Z axis in the robot coordinate system.
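
A minimal sketch of the Q×P×conjugate(Q) step described above, assuming quaternions are stored as (w, x, y, z) arrays; the helper names and the example rotation are illustrative.

```python
import numpy as np

def quat_mul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2])

def quat_conj(q):
    w, x, y, z = q
    return np.array([w, -x, -y, -z])

def rotate_displacement(delta_xyz, q_unit):
    """Rotate a 3D displacement (the 'P' quaternion, zero scalar part)
    by the unit quaternion q_unit, i.e. compute Q x P x conjugate(Q)."""
    p = np.array([0.0, *delta_xyz])
    rotated = quat_mul(quat_mul(q_unit, p), quat_conj(q_unit))
    return rotated[1:]  # drop the scalar part, keep x, y, z

# Example: a 5 cm displacement along the sensor's x axis expressed after a
# 90-degree rotation about z (placeholder values for illustration only).
q = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
print(rotate_displacement([0.05, 0.0, 0.0], q))
```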

Optionally, the method described above can be used to dynamically calibrate, without user intervention, a robotic gripper and one or more cameras in the environment.

Optionally, the method described above can be used to control a robotic arm and bring its gripper and/or endpoint to a given location and a given rotation calculated based on the camera's or cameras' coordinate system. In this case, instead of determining the location and rotation from identification of the gripper points in a later frame, these points are calculated by the processing unit software in the sensor's coordinate system and the method above is used to convert these sensor coordinate values into robot coordinate values.

According to some embodiments of the present invention, there is provided an end-user interface for controlling one or more machines, robots or electrical devices to perform a given task required by an end-user. See FIG. 1 for illustration. FIG. 1 is a schematic representation of a multi-sensor environment where an end-user selects objects through gaze and requests actions to be performed by a robot arm. The illustrated system calculates a unified coordinate system between all the devices in a dynamic way, enabling each of these devices to know the relative position of the other devices in a given scene. This is accomplished either by creating a new universal coordinate system to which and from which any device can map its own location and the locations of the other devices, or by an alternative mechanism in which one device is selected as “master” and every device maps its own location and the locations of other devices through an intermediate conversion to the master device's coordinates. The identification of each of the sensors and/or machines in the environment is assisted, in this particular illustration, by adding chessboard signs or other graphic signs that are recognized by image detection algorithms and from which a spatial position can be derived (x,y or x,y,z). Other unique visual signs can be used instead. FIG. 1 shows:

-   101—End user
-   102—Eye tracker sensor or camera, visually identifiable from external sensors such as the sensor (5) in illustration
-   103—Special graphical identifier for unique object pose and location identification by other devices
-   104—External camera or sensor with other devices in its field of view
-   105—Robot with any combination and quantities of arms, legs, wheels and other actuators and sensors
-   106—Cameras or sensors mounted on robot

The end-user can control the robots' actions by moving his gaze in the direction of an object in the environment, then selecting through the user interface a given object and the action to be performed on it or with it. The method comprises gathering gaze position data from a plurality of images collected from a sensor that has one or more end-users' eyes in its field of view, and position data from one or more sensors in the environment that have the object and the end-user's gaze tracking device in at least one of their fields of view, then enabling a selection capability for the given object by detecting either an eye blink of predetermined time length or a predetermined gaze gesture, and highlighting the object on the screen for feedback purposes. Then, an option selection is enabled by showing on screen to the end-user a list of available actions and allowing the end-user to scroll through them by directing his/her gaze in a given direction or using any other pointing computing device. Options are highlighted on screen in response to detected gaze movements in the given direction. Finally, a selection capability is enabled by detecting the end-user's blink for a predetermined length of time, or by detecting a predetermined gaze gesture in the tracking history of the end-user's pupil centre, while an option is highlighted on screen. Optionally, at this point, a processing unit transfers the coordinates of the selected objects to the robot, converted into coordinates that are relative to the robot itself and based on the first method described above. It also transfers the type of action selected by the end-user to be performed by the robot on the object or with the object. FIG. 5 illustrates the kind of feedback the end-user is presented with in order to let him know the direction the mechanical device will move in and the amount of displacement selected. In a joystick type of control, the robot will move equal distances in periodic lengths of time in the direction selected. When the eye is back at its centre, the robot will stop moving. An arrow highlighting direction and displacement gap is optionally displayed on the screen in order to let the end-user know his current selection. Optionally, images from sensors in the environment or placed at the robot's end point will be overlapped on top of the eye location image, or either of them will be displayed alone.

Optionally, the end-user's pupil home position is set by enabling the end-user to select the “set home position” option while the pupil is detected at a certain image position.

Optionally, the end-user's pupil home position is set automatically by keeping track of the pupil's position in the images at the initial stage and for a given length of time, and taking an average of where the pupil was detected in the image matrix (i.e. the BGR matrix retrieved from the sensor).
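
A minimal sketch of this automatic home-position option; detect_pupil_center and grab_frame are hypothetical callables supplied by the rest of the system, and the three-second window is an illustrative assumption.

```python
import time

def estimate_home_position(detect_pupil_center, grab_frame, duration_s=3.0):
    """Average the detected pupil centre over an initial time window to set
    the home position. detect_pupil_center and grab_frame are assumed
    callables provided elsewhere in the system (hypothetical here)."""
    xs, ys = [], []
    t_end = time.time() + duration_s
    while time.time() < t_end:
        center = detect_pupil_center(grab_frame())   # (col, row) or None
        if center is not None:
            xs.append(center[0])
            ys.append(center[1])
    if not xs:
        raise RuntimeError("no pupil detected during the calibration window")
    return sum(xs) / len(xs), sum(ys) / len(ys)
```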

Optionally, gaze gestures are detected by keeping a history of the pupil's centre coordinates for the last set of images or for a given length of time, in a dynamic way.

Optionally, pupil trajectories that are detected as being similar to an ellipse or a circle, for example by using the fitEllipse or HoughCircles functions of the OpenCV library, and that are moving in a clockwise direction, are interpreted as an increase command or as a scrolling command in one direction.

Optionally, pupil trajectories that are similar in shape to a circumference and moving in a counter-clockwise direction are interpreted as a decrease command or as a scrolling command in the opposite direction. FIG. 3 illustrates this scenario. FIG. 3 is a schematic representation of three different locations of pupils detected by looking for dark circle-like patterns within a predetermined range of diameters. The locations and gestures detected are translated into end-user selections and/or robotic movements.

-   301—Pupils are detected by searching for circle-like dark shapes that fit within predetermined diameter limits. Pupils' home positions are set.
-   302—Pupil displacement is calculated by detecting pixel differences between the home position pupil's center and the current image pupil's center. Direction and pixel displacement are translated into a robot movement direction and distance to be performed. Movement speed is calculated by detecting the pixel displacements of the centers in each image and using the timestamp of each of the image frames used.
-   303—Gaze gestures are recognized by keeping a history of the pupil's center detected through multiple frames, and analyzing trajectory shapes. In this illustration, a counter-clockwise circle type of trajectory is detected after several sequential images are analyzed and the pupil center in each of them is detected.

A home position is set in 301. In 303, circles 1 to 8 illustrate the positions where the pupil was tracked in the last 8 frames. An ellipse type of shape and a counter-clockwise direction are detected in the tracked history.
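
A minimal sketch of the gesture recognition illustrated in FIG. 3, assuming a short history of pupil-centre pixel coordinates. cv2.fitEllipse is used to test whether the track is circle-like, and the sign of the shoelace sum distinguishes the two winding directions; the radius and elongation thresholds are illustrative.

```python
import cv2
import numpy as np

def classify_circular_gesture(track, min_radius_px=15):
    """Given recent (x, y) pupil centres, return 'clockwise',
    'counterclockwise' or None when no circle-like gesture is present."""
    if len(track) < 5:                       # fitEllipse needs five points or more
        return None
    pts = np.array(track, dtype=np.float32).reshape(-1, 1, 2)
    _centre, (w, h), _angle = cv2.fitEllipse(pts)
    if min(w, h) < 2 * min_radius_px or max(w, h) / min(w, h) > 2.0:
        return None                          # too small or too elongated
    xs, ys = pts[:, 0, 0], pts[:, 0, 1]
    # Shoelace sum gives the winding direction; with image coordinates
    # (y grows downward) a positive sum corresponds to clockwise on screen.
    shoelace = float(np.sum(xs * np.roll(ys, -1) - np.roll(xs, -1) * ys))
    return "clockwise" if shoelace > 0 else "counterclockwise"

# Example: a synthetic clockwise-on-screen track around pixel (100, 100).
angles = np.linspace(0.0, 2.0 * np.pi, 24, endpoint=False)
track = [(100 + 30 * np.cos(a), 100 + 30 * np.sin(a)) for a in angles]
print(classify_circular_gesture(track))
```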

Optionally, pupil trajectories that are similar in shape to straight lines are interpreted as a scrolling command in that direction.

Optionally, using the selection options described above, an end-user can increase or decrease the distance that the robot will move for each step. This works as a virtual gear, where a given pixel displacement of the pupil's centre is translated into a given spatial displacement of the robot's end point multiplied or divided by a factor that the end-user selects.

Optionally, using the selection options described above, an end-user can increase or decrease the distance that a cursor on screen will move to indicate each step. This works as a virtual gear, where a given pixel displacement of the pupil's centre is translated into a given on-screen displacement of the cursor multiplied or divided by a factor that the end-user selects.
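
A minimal sketch of the virtual-gear idea from the two options above, with an end-user-editable scale factor; the function name and default values are illustrative assumptions.

```python
def pixels_to_robot_step(dx_px, dy_px, metres_per_pixel=0.001, gear=1.0):
    """Convert a pupil-centre pixel displacement into a robot displacement.
    The end-user can raise or lower 'gear' to lengthen or shorten each step,
    acting as a virtual gear for the controlled device."""
    return dx_px * metres_per_pixel * gear, dy_px * metres_per_pixel * gear

# Example: a 12-pixel rightward gaze displacement with gear 2.0 becomes a
# 24 mm robot step to the right (all values illustrative).
print(pixels_to_robot_step(12, 0, gear=2.0))
```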

Optionally, a camera is placed on a mechanical device and/or robot. The direction of the pupil's movement is translated into movements that the mechanical device performs in the same direction. The images from this camera are then transferred back to the end-user. This enables the ability to explore an environment visually by moving the pupil towards the direction in which the end-user wants to expand and explore. If the end-user, for example, moves his gaze rightwards, far from the pupil's home position, then the mechanical device will move rightwards and images of the camera mounted on it will be transmitted showing an additional portion of the environment towards the right of the previous field of view.

Optionally, when controlling a robot with gaze, an end-user can switch between sets of coordinates and see on screen the robot, and optionally the object being moved, by retrieving images from another sensor that offers this view. This is illustrated in FIG. 7. This figure illustrates the ability of an end-user to switch between sets of two axes in order to control the robot with gaze. By moving between panes shown on screen, as the ones in the figure, the end-user can, for example, translate the x,y of his gaze coordinates to y,z of the robot, or to y,x of the robot, or to any other set of two coordinates. For example, around a robotic arm, two cameras can be placed: one with a view of the robot from above and one with a lateral view of the same robot. The end-user can switch between them with gaze selection or other means, and can control the robot in different sets of 2D coordinates. For example, from above he could control the robot's x, y coordinates, while from the side he could control the robot's y, z coordinates.

Optionally, an option is enabled for the end-user through gaze gestures, allowing him to switch between sets of 2D coordinates of a given mechanical device, and then control with gaze that device on those target coordinates, with or without visual feedback from sensors around that device. See FIG. 7.

Optionally, a 3D coordinate system is implemented where the x,y coordinates are obtained from the row and column of the detected pupil's center in the image, while the z coordinate is calculated based on the diameter of the detected pupil or its relative variations. See FIG. 8 for an illustration of this scenario. This figure illustrates a method where the pupil's trajectory direction on screen controls the robot's movements in two coordinates, while a third coordinate is controlled by calculating the difference in the pupil's diameter. By detecting the difference in pupil diameter over time, a Z displacement is calculated for the robot, or a nearer/farther object is selected along the same X,Y line of view. The X,Y of the robot is calculated based on the X,Y displacement of the pupil in the captured images. The same approach can be used to locate an object that the end-user is looking at or wants to select.

Optionally, the pupil diameter change is used to calculate a spatial difference for one of the coordinate axes. For example, an increase/decrease in the pupil's diameter can be interpreted as an increase/decrease in the z coordinate.
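
A minimal sketch of the three-axis mapping of FIG. 8, assuming per-frame pupil centre and diameter measurements relative to a calibrated home position; the home values and scale factors are illustrative assumptions.

```python
def gaze_to_xyz(cx_px, cy_px, diameter_px,
                home=(320.0, 240.0, 40.0),
                scale=(0.001, 0.001, 0.002)):
    """Map the pupil centre (x, y in pixels) and pupil diameter (pixels) to a
    3D displacement for the robot: x and y come from the centre offset
    relative to the home position, z from the change in pupil diameter."""
    hx, hy, hd = home
    sx, sy, sz = scale
    dx = (cx_px - hx) * sx
    dy = (cy_px - hy) * sy
    dz = (diameter_px - hd) * sz   # dilated pupil -> +z, constricted -> -z
    return dx, dy, dz

# Example: pupil drifted 20 px right, 5 px up, and dilated by 3 px.
print(gaze_to_xyz(340.0, 235.0, 43.0))
```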

Optionally, the x,y coordinates on screen are compensated for the 3D curvature of the user's eyeball.

Optionally, an axis of the eye pupil's movement can be mapped to a different axis on the robot or machine to be controlled, by a selection of the end-user.

According to some embodiments of the present invention, there are provided methods and devices for robotic machine task learning through recording and reproduction of end-users' commands through one or more of the selection methods described above. End-users' commands are stored sequentially or in parallel and then replicated on demand.

Optionally, an option is enabled to allow the end-user to save the robot's current position at any given time in its local coordinate system, and create a robot trajectory between the saved positions that can be run later at the end-user's request.

Optionally, the controller analyses the direction in which the end-user is looking in the environment, and then, through the coordinate transformation system described above, identifies this object's location from the point of view of an external sensor. These coordinates are then converted to any of the devices' or sensors' coordinate systems for future actions.

According to some embodiments of the present invention, there is provided an apparatus associated with a robotic controller. The apparatus comprises at least one processing unit and one or more sensors of images and/or depth maps and/or sounds and/or voice and/or EEG and/or touch. The outputs of these sensors are analysed by the processing unit in order to detect one or more patterns on inputs from one or more sensors, which are translated into one or more commands to the robot, to the processing unit or to other devices. A pattern learning mechanism is implemented by keeping a history of outputs collected from those sensors, analysing any apparent pattern in these outputs and analysing time correlations between patterns recognized from each of the sensors. The end-user can then visualize those patterns and their interrelation, and define a command or set of commands to be executed each time similar pattern combinations are detected in the future.

Optionally, sensors connected to the controller produce raw data such as bit-map images, EEG signals per channel and sound.

Optionally, one or more devices connected to the controller produce pre-processed data. For example, an Emotiv EEG device pre-detects certain commands based on EEG channels, and/or Primesense's sensors identify gestures and produce notifications of these gestures, and/or cellphone devices are able to recognize words pronounced by the end-user. The proposed controller then takes these inputs into account and produces a combined pattern that is later used by the end-user to generate a command or set of commands. If the word “Do” is detected by the cellphone just after a particular command was detected by the Emotiv EEG device and just before the end-user created a given gaze signal, a new pattern is defined and the end-user can associate a command with this pattern. Optionally, each time the same sequence of events is recognized, the controller will perform the selected command.
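
A minimal sketch of how pre-processed events from several devices might be combined into a learned pattern, assuming each device pushes timestamped events into a shared history; the event names, the binding API and the two-second window are illustrative assumptions rather than the disclosed protocol.

```python
import time
from collections import deque

class PatternCombiner:
    """Keeps a short history of (timestamp, source, event) tuples and fires a
    command when a user-defined sequence occurs within the time window."""

    def __init__(self, window_s=2.0):
        self.window_s = window_s
        self.history = deque()
        self.bindings = []          # list of (sequence of (source, event), command)

    def bind(self, sequence, command):
        self.bindings.append((list(sequence), command))

    def push(self, source, event, now=None):
        now = time.time() if now is None else now
        self.history.append((now, source, event))
        # Drop events that fall outside the sliding time window.
        while self.history and now - self.history[0][0] > self.window_s:
            self.history.popleft()
        return self._match()

    def _match(self):
        recent = [(s, e) for _, s, e in self.history]
        for sequence, command in self.bindings:
            it = iter(recent)
            # The bound sequence must appear as an ordered subsequence.
            if all(step in it for step in sequence):
                return command
        return None

# Example: an EEG "push" followed by the spoken word "do" and a rightward
# gaze gesture triggers a grasp command (all names illustrative).
combiner = PatternCombiner()
combiner.bind([("eeg", "push"), ("voice", "do"), ("gaze", "right")], "grasp")
combiner.push("eeg", "push")
combiner.push("voice", "do")
print(combiner.push("gaze", "right"))   # -> "grasp"
```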

Optionally, patterns are detected by fitting geometrical shapes to trajectories created by tracking the relative displacement of the end-users' eye centres. For example, detecting a circular type of movement, or a linear type of movement and its direction. The fitEllipse or HoughCircles functions of OpenCV can be used in order to enable this option, by running them on the backward recorded positions. This tracking mechanism records to memory or disk the position where the centre of the pupil or eyes was detected in each frame and the time when the frame was acquired, among other useful data. The history buffer is pre-set to store a pre-defined set of eye centre positions. Optionally, the history buffer is set by a pre-defined time period. For example, the detected centres of the eyes are recorded and analysed dynamically for the last 10 seconds with respect to the current frame. A FIFO queue is implemented for these records. FIG. 4 is an illustration of multiple patterns recognized from multiple sensors in a sequential way. An end-user is equipped with an EEG device such as Emotiv, an eye tracker such as a glass-mounted, IR-illuminated CMOS micro camera, and a microphone or sound input device (401). In some cases the sound input device can be a cellphone. A pattern detection algorithm runs on the controller of these devices and detects a sequence of patterns between time 0, time 1 and time 2. At time 0 (402), an eye movement was detected towards the right; at time 1, a voice signal was detected and recognized, optionally as a voice command, while in parallel a decrease in one of the EEG channel signals was detected; and at time 2, a circular gesture was detected while tracking the eye pupil's centre. Partial or complete combinations of these detected patterns are presented to the end-user and can be associated with specific commands in future actions, enabling in this way a learning mechanism for combinations of gestures from multiple sensors by the end-user. These commands can be an action on a robot, controlling lights or other electrical devices, or selecting options on a computer screen, for example.

Optionally, the end-user's eye center is detected by fitting an ellipse of predefined minimum and maximum diameter to darker areas of an image collected from a sensor that is located close to the end-user's eye. Using an IR-illuminated black and white CMOS camera or equivalent, for example, the pupil will be the darkest section of the image.
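
A minimal sketch of this pupil-centre detection on an IR-illuminated greyscale frame, assuming OpenCV 4's findContours signature and that the pupil is the darkest blob within a plausible diameter range; the threshold and diameter limits are illustrative.

```python
import cv2

def detect_pupil_center(gray, dark_threshold=40, min_d_px=10, max_d_px=80):
    """Find the pupil as the darkest circle-like blob whose fitted ellipse
    diameter lies within predefined limits; return (x, y) or None."""
    _, dark = cv2.threshold(gray, dark_threshold, 255, cv2.THRESH_BINARY_INV)
    contours, _ = cv2.findContours(dark, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    best = None
    for c in contours:
        if len(c) < 5:                       # fitEllipse needs 5+ points
            continue
        (x, y), (w, h), _angle = cv2.fitEllipse(c)
        d = (w + h) / 2.0
        # Keep roughly circular blobs within the expected pupil size.
        if min_d_px <= d <= max_d_px and max(w, h) / max(min(w, h), 1e-6) < 1.6:
            area = cv2.contourArea(c)
            if best is None or area > best[0]:
                best = (area, (x, y))
    return best[1] if best else None

# Example usage on a frame from the eye-facing camera (path illustrative):
# gray = cv2.imread("eye_frame.png", cv2.IMREAD_GRAYSCALE)
# print(detect_pupil_center(gray))
```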

Optionally, patterns are detected by fitting geometrical shapes to trajectories of other body parts such as finger tips, hands, head orientation and others.

Optionally, patterns are pre-recorded and used to identify end-users' requests.

Optionally, a mechanical device such as a robot is connected to the controller. Commands detected through the patterns system described above are translated into actions that this device will execute.

Optionally, other electrical devices such as lights, appliances or other electrical-powered artefacts are connected to the controller. Commands detected through the patterns system described above are translated into actions that these devices will execute.

Optionally, a predictive method is implemented that anticipates the pattern or combination of patterns to be generated by analysing a partial history of the sensors' output. For example, if patterns were detected and defined based on a set of 50 consecutive images from an input video camera, or from a collection of images acquired during 5 seconds of video history, a prediction method is implemented to detect a potential future pattern based on only the last 20 consecutive images or on the last 2 seconds of video history. If a circle-like movement is tracked from the end-user's eye center position, detecting half a circle in the partial history track activates a prediction that translates into the predicted command corresponding to the circle-like type of shape in the history tracking.
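
A minimal sketch of such a prediction on a partial history of pupil centres: the arc swept by the last few points around their centroid is measured, and roughly half a circle or more triggers the predicted command. The window size, arc threshold and command names are illustrative assumptions.

```python
import numpy as np

def predict_circular_gesture(track, window=20, min_arc_deg=150.0):
    """Predict that a full circle gesture is under way from the last `window`
    pupil centres by measuring the arc they sweep around their centroid.
    Returns a predicted command name or None (thresholds illustrative)."""
    pts = np.asarray(track[-window:], dtype=float)
    if len(pts) < 8:
        return None
    centre = pts.mean(axis=0)
    angles = np.arctan2(pts[:, 1] - centre[1], pts[:, 0] - centre[0])
    unwrapped = np.unwrap(angles)
    swept = np.degrees(unwrapped[-1] - unwrapped[0])
    if abs(swept) < min_arc_deg:             # less than roughly half a circle
        return None
    # The sign of the swept angle separates the two winding directions; the
    # on-screen sense depends on the image y-axis convention.
    return "predicted_scroll_up" if swept < 0 else "predicted_scroll_down"

# Example: 12 tracked points covering about three quarters of a circle.
arc = np.linspace(0.0, 1.5 * np.pi, 12)
print(predict_circular_gesture(
    [(50 + 20 * np.cos(a), 50 + 20 * np.sin(a)) for a in arc]))
```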

Optionally, the methods and embodiments described above are used as a system to assist physically impaired patients who can demand actions from a robot by combining one or more gesture mechanisms: eye gaze, voice, gestures, EEG signals, touch, and others.

Optionally, the methods and embodiments described above are used to control a robot remotely through the Internet or other communication means.

Optionally, the methods and embodiments described above are used to create a semi-automatic robotic system where the end-user highlights objects on the screen based on images collected from the system's sensors, offering feedback on the objects identified and their locations.

What is claimed is:
 1. A method for controlling a robot arm through gaze, composed of at least one sensor acquiring images of one or more end-user's eyes, one or more processing units or controllers and one or more robotic devices; receiving at least one sequence of a plurality of images imaging a human performing at least one gesture by moving at least one eye; performing an analysis of said sequence of a plurality of images to identify said at least one gesture; identifying an association between said at least one gesture and at least one command to be executed; and instructing a robotic machine with said at least one command.
 2. The method of claim 1, wherein the processing unit estimates the position of a given object being observed by the end-user by analyzing the detected position of the pupil in the eye-tracker camera and the eye-tracker camera's location and pose relative to an external sensor in the environment, the current gaze direction located by detecting the pupil's center, and detecting an object in this line of view in the external sensor's 3D image.
 3. The method of claim 2, wherein said at least one external sensor is at least one depth map sensor.
 4. The method according to claim 2, wherein said visual data is complemented or replaced with audible data, touch sense data, and/or EEG data and any combinations thereof, said data produced by said human operating body.
 5. An apparatus associated with at least one processing unit comprising: at least one sensor for collecting a sequence of images of at least one human eye; at least one processing unit configured for analyzing a sequence of images from at least one of said sensors to identify gestures performed by moving eyes in certain directions, determining the center of the eye in each frame and determining at least one command to perform based on the moving pattern, and generating at least one processing unit command; at least one storage unit configured for saving said sequence of a plurality of images and the detected at least one command, said at least one command of said processing unit, at least one processing algorithm and the processed motion of said robotic device; at least one digital communication means configured for communicating said at least one command to said at least one associated processing unit; and housing configured for containing said at least one spatial sensor, said at least one processing unit, said storage unit and said communication means.